Extending data confidentiality into a player application

ABSTRACT

In a content protection scheme, and in response to a request for a content segment received by a server, the server generates and associates with the segment a message that confers entitlement to a session-specific key from which one or more decryption keys may be derived. The decryption keys are useful to decrypt the segment at runtime as it is about to be rendered by a player. Before delivery, the server encrypts the segment to generate an encrypted fragment, and it then serves the encrypted fragment (and the message) in response to the request. At the client, information in the message is used to obtain the session-specific key. Using that key, the decryption keys are derived, and those keys are then used to decrypt the received encrypted fragment. The decryption occurs at runtime. The approach protects content while in transit to and at rest in the client browser environment.

This application is based on and claims priority to Ser. No. 61/428,893, filed Dec. 31, 2010.

BACKGROUND OF THE INVENTION

1. Technical Field

This application relates generally to delivery online of high definition (HD) video at broadcast audience scale to popular runtime environments and mobile devices.

2. Brief Description of the Related Art

Distributed computer systems are well-known in the prior art. One such distributed computer system is a “content delivery network” or “CDN” that is operated and managed by a service provider. The service provider typically provides the content delivery service on behalf of third parties. A “distributed system” of this type typically refers to a collection of autonomous computers linked by a network or networks, together with the software, systems, protocols and techniques designed to facilitate various services, such as content delivery or the support of outsourced site infrastructure. Typically, “content delivery” means the storage, caching, or transmission of content, streaming media and applications on behalf of content providers, including ancillary technologies used therewith including, without limitation, DNS query handling, provisioning, data monitoring and reporting, content targeting, personalization, and business intelligence.

While content delivery networks provide significant advantages, typically they include dedicated platforms to support delivery of content for multiple third party runtime environments that are, in turn, based on their own proprietary technologies, media servers, and protocols. These distinct platforms are costly to implement and to maintain, especially globally and at scale as the number of end users increases. Moreover, at the same time, content providers (such as large-scale broadcasters, film distributors, and the like) desire their content to be delivered online in a manner that complements traditional mediums such as broadcast TV (including high definition or “HD” television) and DVD. This content may also be provided at different bit rates. End users also desire to interact with the content as they can do now with traditional DVR-based content delivered over satellite or cable. A further complication is that Internet-based content delivery is no longer limited to fixed line environments such as the desktop, as more and more end users now use mobile devices such as the Apple® iPhone® to receive and view content over mobile environments.

Thus, there is a need to provide an integrated content delivery network platform with the ability to deliver online content (such as HD-quality video) at broadcast audience scale to the most popular runtime environments (such as Adobe® Flash®, Microsoft® Silverlight®, Apple® iOS®, etc.) as well as to mobile devices such as the iPhone to match what viewers expect from traditional broadcast TV. The techniques disclosed herein address this need.

BRIEF SUMMARY

A method of securing media extends data confidentiality into a player application within the context of an integrated HTTP-based delivery platform that provides for the delivery online of HD-video and audio quality content to popular runtime environments operating on multiple types of client devices in both fixed line and mobile environments. The technique is designed to fend off scalable attacks, such as attacks caused by link/token sharing and automatic update services, key sharing, transport level content decryption, copy at rest, and the like, without resort to digital rights management (DRM). The approach protects the content while in transit to and at rest in the client browser environment.

In one embodiment, the content is served from an edge network server to a requesting client browser having a media player. In particular, and in response to a request for a segment of content that is received by the edge network server, the server generates and associates with the segment an entitlement control message (ECM) that confers entitlement to a session-specific key from which one or more decryption keys are adapted to be derived. The segment of content typically represents a time slice of contiguous video and audio data of a configurable length at a discrete bitrate. Neither the session-specific key nor the decryption keys need to be included in the message itself. The decryption keys are adapted for use to decrypt the segment at runtime as the segment is about to be rendered by a client player. The edge network server encrypts the segment of content to generate an encrypted fragment, and it then serves the encrypted fragment (and the ECM, which may be embedded therein) in response to the request. At the client, information in the ECM (typically a token-protected URL) is used to obtain the session-specific key. Using that key, the one or more decryption keys are derived, and the decryption keys are then used to decrypt the received encrypted fragment. The decryption occurs at runtime as the segment is about to be rendered by the player code.

The foregoing has outlined some of the more pertinent features of the invention. These features should be construed to be merely illustrative. Many other beneficial results can be attained by applying the disclosed invention in a different manner or by modifying the invention as will be described.

BRIEF DESCRIPTION OF THE DRAWINGS

For a more complete understanding of the present invention and the advantages thereof, reference is now made to the following descriptions taken in conjunction with the accompanying drawings, in which:

FIG. 1 depicts an exemplary block diagram of a distributed computer system environment in which exemplary aspects of the illustrative embodiments may be implemented;

FIG. 2 is an exemplary block diagram of an edge server machine in the content delivery network in FIG. 1;

FIG. 3 illustrates an HTTP-based content delivery architecture in which the secure media protection technique of this disclosure may be implemented;

FIG. 4 shows an alternative embodiment of the content delivery architecture in FIG. 3;

FIG. 5 illustrates an edge server-client browser interaction over which the protection technique of this disclosure operates;

FIG. 6 illustrates a preferred operation of the secure media protection technique of this disclosure;

FIG. 7 is a UML diagram illustrating the various steps of the technique;

FIG. 8 illustrates a first “payload encryption” embodiment showing how the ECM message is packaged;

FIG. 9 illustrates a second “content encryption” embodiment showing how the ECM message is packaged for segmented data;

FIG. 10 illustrates a third embodiment showing how the ECM message is packaged for MPEG2 Transport Streams; and

FIG. 11 illustrates an embodiment for HTTP dynamic streaming for use in an Adobe® Flash® runtime environment.

DETAILED DESCRIPTION

FIG. 1 illustrates a known distributed computer system that (as described below) is extended by the techniques herein to provide a single HTTP-based platform with the ability to deliver online HD video at broadcast audience scale to the most popular runtime environments and to the latest devices in both fixed line and mobile environments.

In this representative embodiment, a distributed computer system 100 is configured as a content delivery network (CDN) and is assumed to have a set of machines 102 a-n distributed around the Internet. Typically, most of the machines are servers located near the edge of the Internet, i.e., at or adjacent end user access networks. A network operations command center (NOCC) 104 may be used to administer and manage operations of the various machines in the system. Third party sites, such as web site 106, offload delivery of content (e.g., HTML, embedded page objects, streaming media, software downloads, and the like) to the distributed computer system 100 and, in particular, to “edge” servers. Typically, content providers offload their content delivery by aliasing (e.g., by a DNS CNAME) given content provider domains or sub-domains to domains that are managed by the service provider's authoritative domain name service. End users that desire such content may be directed to the distributed computer system to obtain that content more reliably and efficiently. Although not shown in detail, the distributed computer system may also include other infrastructure, such as a distributed data collection system 108 that collects usage and other data from the edge servers, aggregates that data across a region or set of regions, and passes that data to other back-end systems 110, 112, 114 and 116 to facilitate monitoring, logging, alerts, billing, management and other operational and administrative functions. Distributed network agents 118 monitor the network as well as the server loads and provide network, traffic and load data to a DNS query handling mechanism 115, which is authoritative for content domains being managed by the CDN. A distributed data transport mechanism 120 may be used to distribute control information (e.g., metadata to manage content, to facilitate load balancing, and the like) to the edge servers.

As illustrated in FIG. 2, a given machine 200 in the CDN (sometimes referred to herein as an “edge machine”) comprises commodity hardware (e.g., an Intel Pentium processor) 202 running an operating system kernel (such as Linux or variant) 204 that supports one or more applications 206 a-n. To facilitate content delivery services, for example, given machines typically run a set of applications, such as an HTTP proxy 207, a name server 208, a local monitoring process 210, a distributed data collection process 212, and the like. The HTTP proxy 207 comprises a cache, together with a manager process (sometimes referred to as a global host, or “ghost”) for managing the cache and delivery of content from the edge machine. For streaming media, the machine typically includes one or more media servers, such as a Windows Media Server (WMS) or Flash 2.0 server, as required by the supported media formats. When configured as a CDN “edge” machine (or “edge server”), the machine shown in FIG. 2 may be configured to provide one or more extended content delivery features, preferably on a domain-specific, customer-specific basis, preferably using configuration files that are distributed to the edge servers using a configuration system. A given configuration file preferably is XML-based and includes a set of content handling rules and directives that facilitate one or more advanced content handling features. The configuration file may be delivered to the CDN edge server via the data transport mechanism. U.S. Pat. No. 7,111,057 illustrates a useful infrastructure for delivering and managing edge server content control information and this and other edge server control information (sometimes referred to as “metadata”) can be provisioned by the CDN service provider itself, or (via an extranet or the like) the content provider customer who operates the origin server.

The CDN may include a storage subsystem (sometimes referred to herein as “Storage”), such as described in U.S. Pat. No. 7,472,178, the disclosure of which is incorporated herein by reference.

The CDN may operate a server cache hierarchy to provide intermediate caching of customer content; one such cache hierarchy subsystem is described in U.S. Pat. No. 7,376,716, the disclosure of which is incorporated herein by reference.

For live streaming delivery, the CDN may include a live delivery subsystem, such as described in U.S. Pat. No. 7,296,082, the disclosure of which is incorporated herein by reference.

U.S. Ser. No. 12/858,177, filed Aug. 17, 2010, describes how the above-identified technologies can be extended to provide an integrated HTTP-based delivery platform that provides for the delivery online of HD-video quality content to the most popular runtime environments and to the latest devices in both fixed line and mobile environments. The platform supports delivery of both “live” and “on-demand” content.

As described in Ser. No. 12/858,177, the following terms shall have the following representative meanings. For convenience of illustration only, the description that follows (with respect to live streaming delivery) is in the context of the Adobe Flash runtime environment, but this is not a limitation, as a similar type of solution may also be implemented for other runtime environments both fixed line and mobile (including, without limitation, Microsoft Silverlight, Apple iPhone, and others).

An Encoder is a customer-owned or managed machine which takes some raw live video feed in some format (streaming, satellite, etc.) and delivers the data to an Entry Point encoded for streaming delivery. An Entry Point (EP) typically is a process running on a CDN streaming machine which receives video data from the customer's Encoder and makes this data available to consumers of the live stream. For Adobe Flash, this is a Flash Media Server (FMS) configured to accept connections from Encoders. A Flash Media Server is a server process for Flash media available from Adobe Corporation. In this embodiment, an Intermediate Region (IR) typically is a Flash Media Server which the CDN has configured to act analogously to a streaming set reflector, such as described in U.S. Pat. No. 7,296,082 and U.S. Pat. No. 6,751,673. These machines relay streams from FMS EPs to FMS Edge regions, providing fan out and path diversity. A “Region” typically implies a set of machines (and their associated server processes) that are co-located and are interconnected to one another for load sharing, typically over a back-end local area network. A Flash Edge machine is a Flash Media Server which has been configured to accept client requests. This is the software running on the Flash EP, IR, and Edge machines in a representative embodiment. Intermediate Format (IF) is an internal (to the CDN) format for sending streaming data from EP to an edge server HTTP proxy. As will be described in more detail below, IF preferably comprises several different pieces, including “Stream Manifest,” “Fragment Indexes,” and “IF Fragments.” Live, DVR and VOD are defined as follows: “Live” refers to media served in real time as an event occurs; “DVR” refers to serving content acquired from a “live” feed but served at a later time; “VOD” refers to media served from a single, complete (i.e., not incrementally changing) file or set of files. Real Time Messaging Protocol (RTMP) is the streaming and RPC protocol used by Flash. Real Time Messaging Protocol Encrypted (RTMPE) is the encrypted version of RTMP using secrets built into the server and client. “SWF” or “Small Web Format” is the format for Flash client applications. SWF verification refers to a technique by which the Flash Player can authenticate to FMS that it is playing an unmodified SWF by sending hashes of the SWF itself along with secrets embedded in the client and server.

FIG. 3 illustrates an overview of the architecture for live streaming delivery as described in Ser. No. 12/858,177, filed Aug. 17, 2010. As seen in FIG. 3, the system generally is divided into two independent tiers: a stream recording tier 300, and a stream player tier 302. The recording process (provided by the stream recording tier 300) is initiated from the Encoder 304 forward. Preferably, streams are recorded even if there are currently no viewers (because there may be DVR requests later). The playback process (provided by the stream player tier 302) plays a given stream starting at a given time. Thus, a “live stream,” in effect, is equivalent to a “DVR stream” with a start time of “now.”

Referring to FIG. 3, the live streaming process begins with a stream delivered from an Encoder 304 to an Entry Point 306. An RTMP Puller component 308 (e.g., running on a Linux-based machine) in an EP Region (not shown) is instructed to subscribe to the stream on the EP 306 and to push the resulting data to one or more Archiver 310 processes, preferably running on other machines. As illustrated, one of the Archivers 310 may operate as the “leader” as a result of executing a leader election protocol across the archiving processes. Preferably, the Archivers 310 act as origin servers for the edge server HTTP proxy processes (one of which is shown at 312) for live or near-live requests. The edge server HTTP proxy 312 provides HTTP delivery to requesting end user clients, one of which is the Client 314. A “Client” is a device that includes appropriate hardware and software to connect to the Internet, that speaks at least HTTP, and that includes a content rendering engine. The Client device type will vary depending on whether the device connects to the Internet over a fixed line environment or a mobile environment. A representative client is a computer that includes a browser, typically with native or plug-in support for media players, codecs, and the like. If DVR is enabled, content preferably is also uploaded to the Storage subsystem 316, so that the Storage subsystem serves as the origin for DVR requests as will be described.

As also seen in FIG. 3, the content provider may choose to deliver two copies of the stream, a primary copy, and a backup copy, to allow the stream to continue with minimal interruption in the event of network or other problems. Preferably, the primary and backup streams are treated as independent throughout the system up through the edge server HTTP proxy, which preferably has the capability of failing over from the primary to the backup when the primary is having difficulties, and vice versa.

A content request (from an end user Client 314) is directed to the CDN edge machine HTTP proxy 312, preferably using techniques such as described in U.S. Pat. Nos. 6,108,703, 7,240,100, 7,293,093 and others. When an HTTP proxy 312 receives an HTTP request for a given stream, the HTTP proxy 312 makes various requests, preferably driven by HTTP proxy metadata (as described in U.S. Pat. Nos. 7,240,100, 7,111,057 and others), possibly via a cache hierarchy 318 (see, e.g., U.S. Pat. No. 7,376,716 and others) to learn about and download a stream to serve to the Client 314. Preferably, the streaming-specific knowledge is handled by the edge machine HTTP proxy 312 directly connected to a Client 314. Any go-forward (cache miss) requests (issued from the HTTP proxy) preferably are standard HTTP requests. In one embodiment, the content is delivered to the Client 314 from the HTTP proxy 312 as a progressive-download FLV file. As noted above, the references herein to Adobe FLV are used herein by way of example, as the architecture shown in FIG. 3 is not limited for use with Adobe FLV. For secure streams, preferably the Client 314 first authenticates to the HTTP proxy 312 using an edge server authentication technique and/or a SWF-verification back-channel.

When a Client 314 requests a particular stream, the HTTP proxy 312 (to which the client has been directed, typically via DNS) starts the streaming process by retrieving a “Stream Manifest” that contains preferably only slowly changing attributes of the stream and information needed by the HTTP proxy to track down the actual stream content. The URL to download this manifest preferably is constructed deterministically from metadata delivered (e.g., via the distributed data transport mechanism of FIG. 1) to the HTTP proxy. Preferably, the manifest itself is stored in association with a Stream Manifest Manager system (not shown) and/or in the storage subsystem 316. Preferably, a Stream Manifest describes the various “tracks” that compose a stream, where preferably each track constitutes a different combination of bit rate and type, where type is “audio,” “video,” or “interleaved_AV.” The Stream Manifest preferably includes a sequence of “indexInfo” time ranges for each track that describe forward URL templates, stream properties, and various other parameters necessary for the HTTP proxy to request content for that time range.

For “live” requests, the HTTP proxy starts requesting content relative to “now,” which, in general, is approximately equal to the time on the edge machine HTTP proxy process. Given a seek time, the HTTP proxy downloads a “Fragment Index” whose name preferably is computed based on information in the indexInfo range and an epoch seek time. Preferably, a Fragment Index covers a given time period (e.g., every few minutes). By consulting the Fragment Index, an “Intermediate Format (IF) Fragment” number and an offset into that fragment are obtained. The HTTP proxy can then begin downloading the file (e.g., via the cache hierarchy 318, or from elsewhere within the CDN infrastructure), skipping data before the specified offset, and then begin serving (to the requesting Client) from there. Preferably, the IF fragments are sized for optimal caching by the HTTP proxy. In general, and unless the Stream Manifest indicates otherwise with a new indexInfo range, for live streaming the HTTP proxy then continues serving data from consecutively-numbered IF Fragments.
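The seek-time lookup described above can be sketched as follows. This is an illustrative sketch only: the index naming scheme, the five-minute index period, and the `IndexEntry` layout are assumptions made for the example and are not taken from the disclosure.

```python
from bisect import bisect_right
from dataclasses import dataclass

# Hypothetical: each Fragment Index is assumed to cover a 5-minute period.
INDEX_PERIOD_SECONDS = 300


@dataclass(frozen=True)
class IndexEntry:
    """One seekable point in a Fragment Index (layout is an assumption)."""
    time: int             # epoch seconds at which this point begins
    fragment_number: int  # IF Fragment that contains the point
    offset: int           # byte offset of the point within that fragment


def index_name(track: str, seek_time: int) -> str:
    """Compute a Fragment Index name from a track name and an epoch seek time."""
    period_start = seek_time - seek_time % INDEX_PERIOD_SECONDS
    return f"{track}/index_{period_start}"


def locate(entries: list, seek_time: int) -> tuple:
    """Return (fragment number, offset) of the last entry at or before seek_time.

    `entries` must be sorted by time.
    """
    times = [entry.time for entry in entries]
    position = bisect_right(times, seek_time) - 1
    if position < 0:
        raise ValueError("seek time precedes the first entry in this index")
    entry = entries[position]
    return entry.fragment_number, entry.offset
```

After `locate` returns, a proxy following the scheme above would fetch that fragment, skip the bytes before the offset, and continue with consecutively numbered fragments.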

In the context of live HTTP-based delivery, the Intermediate Format (IF) describes an internal representation of a stream used to get data from the RTMP Puller through to the edge machine HTTP proxy. A “source” format (SF) is a format in which the Entry Point 306 provides content and a “target” format (TF) is a format in which edge machine HTTP proxy 312 delivers data to the Client 314. These formats need not be the same. Thus, SF may differ from TF, i.e., a stream may be acquired in FLV format and served in a dynamic or adaptive (variable bit rate) format. The format is the container used to convey the stream; typically, the actual raw audio and video chunks are considered opaque data, although transcoding between different codecs may be implemented as well. By passing the formats through the HTTP proxy (and delivering to the Client via conventional HTTP), the container used to deliver the content can be changed as long as the underlying codecs are managed appropriately.

The above-described architecture is useful for live streaming, particularly over formats such as Flash. The platform can also be used to support Video on demand (VOD). In particular, the solution can provide VOD streaming from customer and Storage subsystem-based origins, provides single and multiple bitrate (SBR and MBR) streaming, provides support for origin content stored in flv and mp4/flv containers (supported mp4/flv codecs include, among others, AAC, MP3, PCM for audio, and H.264 for video), and minimizes download of content beyond what is directly requested by the end user.

For VOD delivery, the stream recorder tier 300 (of FIG. 3) is replaced, preferably with a translation tier. For VOD delivery using HTTP, the Fragment Indexes may be generated from the origin content on-the-fly (e.g., by scanning FLV or parsing MP4 MOOV atoms) and caching these indexes. Actual data retrievals may then be implemented as “partial object caching” (POC) retrievals directly from source material at the edge region or via an intermediate translation (e.g., by a cache-h parent) into an Intermediate Format. Partial object caching refers to the ability of an HTTP proxy to fetch a content object in fragments only as needed rather than downloading the entire content object. The HTTP proxy can cache these fragments for future use rather than having to release them after being served from the proxy. An origin server from which the content object fragments are retrieved in this manner must support the use of HTTP Range requests.
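As a rough illustration of partial object caching, the sketch below fetches only the fixed-size chunks that overlap a requested byte span, using HTTP `Range` header values of the form `bytes=first-last`, and keeps the chunks for reuse. The class name, the chunk size, and the in-memory stand-in for the origin are hypothetical; a real proxy would issue the Range requests over HTTP.

```python
def range_header(first: int, last: int) -> str:
    """Build an HTTP Range header value; both byte positions are inclusive."""
    return f"bytes={first}-{last}"


class PartialObjectCache:
    """Fetches an object chunk by chunk, only as needed, and keeps the chunks."""

    def __init__(self, fetch_range, chunk_size: int = 65536):
        self.fetch_range = fetch_range  # callable: Range header value -> bytes
        self.chunk_size = chunk_size
        self.chunks = {}                # chunk number -> cached bytes

    def read(self, start: int, end: int) -> bytes:
        """Return bytes [start, end) of the object, fetching missing chunks."""
        if end <= start:
            return b""
        first_chunk = start // self.chunk_size
        last_chunk = (end - 1) // self.chunk_size
        data = bytearray()
        for number in range(first_chunk, last_chunk + 1):
            if number not in self.chunks:
                low = number * self.chunk_size
                high = low + self.chunk_size - 1
                self.chunks[number] = self.fetch_range(range_header(low, high))
            data += self.chunks[number]
        skip = start - first_chunk * self.chunk_size
        return bytes(data[skip:skip + (end - start)])


def make_fake_origin(body: bytes):
    """In-memory stand-in for an origin that honours Range requests."""
    calls = []

    def fetch(header_value: str) -> bytes:
        calls.append(header_value)
        low, high = header_value[len("bytes="):].split("-")
        return body[int(low):int(high) + 1]

    return fetch, calls
```

With a 4-byte chunk size and a 16-byte object, reading bytes 5 through 9 triggers two range fetches (`bytes=4-7` and `bytes=8-11`); a later read inside those chunks is served from the cache without contacting the origin.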

As described in Ser. No. 12/858,177, filed Aug. 17, 2010, typically VOD content is off-loaded for HTTP delivery to the CDN. In a first embodiment, a conversion tool (a script) is used to convert source content flv to IF, with the resulting IF files then uploaded to the Storage subsystem. In this approach, metadata is used to have an HTTP proxy go forward to the Storage subsystem to retrieve the stream manifest, which then references the Storage subsystem for the remaining content. In this approach, files in mp4/flv are first converted to flv (e.g., using ffmpeg copy mode) to change the container to flv. Another approach is to have a CDN customer upload raw media files to the Storage subsystem and to run a conversion tool there. Yet another alternative is to have the customer (or encoder) produce content in IF directly.

The translation tier approach is described in Ser. No. 12/858,177, filed Aug. 17, 2010. In this approach, an on-demand dynamic IF generator machine takes requests for IF (manifests, indexes, and fragments) and satisfies these requests by dynamically retrieving flv or mp4/f4v input file ranges (either from the Storage subsystem or customer origin). From there, HTTP proxy treatment is essentially the same as the “conversion tool” options described above. The generator machine preferably runs its own HTTP proxy (the “translator HTTP proxy”) to cache various inputs and outputs, together with a translator process (described below) that accepts requests (e.g., from a localhost connection to the translator HTTP proxy) and generates IF based on data retrieved from the HTTP proxy via an associated cache process. In an alternative, the translator process may comprise part of the translator HTTP proxy, in which case IF generation takes place within the proxy. Fragment generation may also be carried out in an edge machine HTTP proxy or even further downstream (into the Client itself), such as where a Client maintains a session connection with one or more peer clients.

An architecture and request flow of a VOD approach is shown in FIG. 4. In this embodiment, a translation tier 400 is located between an origin 402 (e.g., customer origin, or the Storage subsystem, or both) and the stream player tier 404. In a representative embodiment, the translation tier executes in its own portion (e.g., a Microsoft IIS or equivalent network) within the CDN, preferably in a Region dedicated to this purpose. Alternatively, a translator (as described below) may run on a subset of HTTP-based edge machine Regions.

The above-described embodiments provide a format-agnostic streaming architecture that utilizes an HTTP edge network for object delivery. The edge network may include one or more security mechanisms. In a representative implementation, as seen in FIG. 5, it is assumed that an end user machine 500 has an associated client browser 502 and a media player 504. Content to be rendered by the media player is deliverable over network 505 to the end user machine from an edge server machine 506 running an HTTP proxy process (sometimes referred to herein as “ghost”) 508, such as described above. As will be described in more detail below, the proxy has the capability of encrypting the content, preferably under metadata control. The player comprises a player component 510 (one or more codecs) and a security module (the AUTH module) 512. This module includes cryptographic hashing and encryption methods for the purpose of identifying the running player, and securely communicating the results to an edge server. The AUTH module preferably is a renewable player side plugin that allows for a change in the hash-computing algorithm (KDF) in real-time, without requiring any player patch to the player runtime. The AUTH module, in the alternative, may be integral with the other player code. As a security mechanism, the player code may be subject to a “player verification” operation, which provides a means for the edge server (more generally, the edge network) to verify the player implementation. Player verification prevents unauthorized players from playing protected content (typically through deep linking attacks). A player verification mechanism ensures that a player and, optionally, its resident AUTH module, are authentic. This is typically achieved by hashing (e.g., using MD5, SHA-1, or the like) the player and AUTH module to produce a message digest for verification by the edge server. The player verification scheme also may test (periodically) the running image (on the client) for the presence of a security code, and it may obfuscate the AUTH module. In addition, the edge network typically enforces an edge “token authorization” (or token auth) scheme that provides verification of end-user credentials and transfer of access control. Token authorization, as is well-known, uses a server-generated hash of session information and access control parameters. The server transfers a token to the player either in a URL query string or a cookie. A token authorization scheme works to mitigate link sharing or hijacking attacks.
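A token authorization scheme of the general kind described above might be sketched as follows. This is a minimal sketch, not the edge network's actual token format: the query parameter names (`sid`, `exp`, `token`), the choice of HMAC-SHA256, and the field layout of the hashed message are all assumptions made for illustration.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlsplit


def make_token_url(secret: bytes, path: str, session_id: str, expires: int) -> str:
    """Server side: hash session information and access control parameters
    (here, an expiry time) and carry the result in the URL query string."""
    message = f"{path}|{session_id}|{expires}".encode()
    token = hmac.new(secret, message, hashlib.sha256).hexdigest()
    query = urlencode({"sid": session_id, "exp": expires, "token": token})
    return f"{path}?{query}"


def verify_token_url(secret: bytes, url: str, now: int) -> bool:
    """Server side: recompute the hash and check the expiry before serving."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    try:
        session_id = query["sid"][0]
        expires = int(query["exp"][0])
        token = query["token"][0]
    except (KeyError, ValueError):
        return False
    if now > expires:
        return False
    message = f"{parts.path}|{session_id}|{expires}".encode()
    expected = hmac.new(secret, message, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, token)
```

Because the path is part of the hashed message, a token issued for one URL does not validate for another, and the expiry bounds how long a shared link remains usable.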

While token authorization and player verification provide useful advantages, it may be desired to provide additional content protection mechanisms. One such technique, which is the subject of this disclosure, is now described.

Extending Confidentiality into a Player

The above-described network typically uses SSL to protect content as it streams from an edge server ghost process to the player application resident in the browser. This use of SSL, while effective for E-commerce, was not designed to protect application data of interest from attack by the end user. Thus, data passing through memory on the end user's browser is unprotected. While there are exceptions, most browsers today do not include secure memory or processors to offload the decryption of streamed multimedia data. In addition, any application capable of handling an anonymous SSL/TLS session can impersonate a legitimate browser for the single purpose of making unauthorized copies. To make matters worse, even legitimate browsers are allowed to cache decrypted data to non-volatile storage, leaving copies readable by any user with browser access.

Thus, premium content owners face a significant challenge in securing their content from theft while it is being delivered to end user client systems or while it is at rest on client systems. Content delivered as part of progressive media download is exposed in cache or on disk even if it is delivered over SSL, as SSL terminates at the browser, leaving the content exposed in the cache. Premium content delivered to player runtime needs to be protected all the way to runtime and while in cache or on disk. The approach described herein encrypts the content all the way to player runtime and provides an alternative to customers who do not require a full DRM solution and its overhead, but still need to prevent unauthorized access to content while it is in transit or at rest.

The technique described herein extends data confidentiality into the player application to fend off scalable attacks against the delivery service. The basic technique works as follows. In response to a request for a content segment received by a server, the server generates and associates with the segment a special message that confers entitlement to a session-specific key from which one or more decryption keys may be derived. The decryption keys are useful to decrypt the segment at runtime as it is about to be rendered by a player. Before delivery, the server encrypts the segment to generate an encrypted fragment, and it then serves the encrypted fragment (and the message) in response to the request. At the client, information in the message (typically, a token-protected URL) is used to obtain the session-specific key. Using that key, the decryption keys are derived, and those keys are then used to decrypt the received encrypted fragment. The decryption occurs at runtime. The approach protects content while in transit to and at rest in the client browser environment.
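The basic flow just described can be illustrated end to end with the sketch below. Everything concrete in it is an assumption made for the example: the `ECM` fields, the `/keys/<session>` URL shape, and in particular the cipher, which is a toy HMAC-SHA256 keystream used only so that the example runs with the standard library. It is not the cipher of the disclosure and should not be used to protect real content. The token check on the key URL is omitted for brevity.

```python
import hashlib
import hmac
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class ECM:
    """Entitlement message: it references the session key but carries no key."""
    key_url: str
    segment_number: int


def derive_key(session_key: bytes, segment_number: int) -> bytes:
    """Derive a per-segment decryption key from the session-specific key."""
    info = b"segment|" + segment_number.to_bytes(8, "big")
    return hmac.new(session_key, info, hashlib.sha256).digest()


def keystream_xor(key: bytes, data: bytes) -> bytes:
    """Toy stand-in cipher: XOR data with an HMAC-generated keystream.
    Applying it twice with the same key restores the original bytes."""
    out = bytearray()
    for start in range(0, len(data), 32):
        pad = hmac.new(key, (start // 32).to_bytes(8, "big"), hashlib.sha256).digest()
        out.extend(a ^ b for a, b in zip(data[start:start + 32], pad))
    return bytes(out)


class EdgeServer:
    def __init__(self):
        self.session_keys = {}  # key URL -> session-specific key

    def serve_segment(self, session_id: str, segment_number: int, segment: bytes):
        """Encrypt the segment and return it together with its ECM."""
        key_url = f"/keys/{session_id}"  # hypothetical URL shape
        if key_url not in self.session_keys:
            self.session_keys[key_url] = secrets.token_bytes(32)
        key = derive_key(self.session_keys[key_url], segment_number)
        return ECM(key_url, segment_number), keystream_xor(key, segment)

    def fetch_session_key(self, key_url: str) -> bytes:
        """Stands in for the token-protected key request made by the player."""
        return self.session_keys[key_url]


def play(server: EdgeServer, ecm: ECM, fragment: bytes) -> bytes:
    """Client side: obtain the session key, derive the key, decrypt at runtime."""
    session_key = server.fetch_session_key(ecm.key_url)
    return keystream_xor(derive_key(session_key, ecm.segment_number), fragment)
```

Note that the fragment stays encrypted everywhere except inside `play`, which mirrors the goal of keeping content protected in transit and in the browser cache.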

Preferably, and as will be described, the approach uses aspects of symmetric key encryption, ephemeral keying (Session Key), HMAC-based key derivation, AUTH module (as defined below) binding, and code obfuscation to protect content from stream rippers and from copying of data at rest. This solution moves the content protection scheme into the player application, maintaining encryption to the point of playback. When content owner requirements do not call for a DRM, this approach provides a protection solution that allows customers to monetize content with an affordable, lightweight deterrent to theft and misuse of content. Specifically, the approach protects against: scalable attacks against the delivery service, scalable attacks against the encryption key, scalable last-mile content tampering, transport level content decryption, and copy at rest in cache. Unlike a traditional DRM system, the approach need not support the specification of policy or offline access. Instead, through the use of conditional access parameters, individual access requests are granted or denied when the client begins playback. When granted access, the player deciphers a set of bytes that are uniquely protected for the player's security module.

By using a combination of configurable security elements in this manner, the player security module accepts messages in the content stream that provide the instructions needed to decrypt the payload. As will be described, these messages are used to produce the keys that protect the clear text content. The message format supports group, per session, and shared keying, along with basic payload encryption (e.g., for certain media formats such as Adobe® Flash®) and container encryption (e.g., for certain other media formats, such as Microsoft® Silverlight® and Apple® iOS®). Unlike a DRM scheme, however, the messages themselves need not contain keys and they need not carry policy information. Instead, preferably the messages confer entitlement through conditional access to a session-specific key referenced through a URL in the message. In this way, access to the session key requires a valid edge authentication token, preferably for each segment of content. Along with certain control words, which are allowed in protected portions of the message, the technique allows the system to switch keys more frequently, incurring lower risk of prolonged breach. This is provided through the use of a set of renewable key derivation functions and metadata configuration parameters.

The secure media content protection mechanism of this disclosure protects against eavesdropping on content en route to the player. If desired, end-to-end encryption (such as via SSL) may be used to protect data from the server egress to the client browser. This encryption prevents network packet captures from separating the content from the player to which it is directed.

FIG. 6 illustrates the basic approach. At step 600, the player is loaded, typically within the client browser or other rendering engine, and makes a request for a content manifest to an edge server running an http proxy (ghost) provisioned to provide the desired functionality. At step 602, the http proxy that receives the request performs an authorization check; if the check passes, the proxy builds and serves back to the requesting player a response that includes the manifest. The player then reads the manifest and requests a first fragment of the desired content. This is step 604. Typically, a “fragment” represents a time slice of contiguous video and audio data of a configurable length, at a discrete bit rate. The player request (generated at step 604) is received by the edge server http proxy. The edge server performs another authorization check; if the check succeeds, the edge server proxy encrypts the fragment. This is step 606. At step 608, the edge server proxy packages the encrypted first fragment with a message, referred to as an entitlement control message (ECM); as will be seen, this message is used to provide instructions needed to decrypt the remaining payload (namely, the video and/or audio content). In particular, the message is used to produce the keys that protect the clear text content; the ECM, however, need not include the actual keys. Rather, the ECM confers entitlement through conditional access to a session-specific key, preferably referenced through a URL in the message. (Although not required, preferably the manifest and content fragment links (URLs) are token-protected). At step 610, the edge server proxy serves the encrypted fragment (with the ECM packaged therewith). At step 612, the player reads the ECM from the fragment. The player then makes another request to the edge server proxy to request the session key. This is step 614. At step 616, the edge server proxy performs another authorization check; if that check passes, the edge server creates the session key. The session key is sent to the client at step 618. At step 620, the player decrypts the fragment and, at step 622, the player plays (renders) the fragment. This completes the basic operation.

FIG. 7 is a UML diagram illustrating the basic operation of the above-described scheme for a Flash-based player. In this diagram the Flash player 700 interacts with the edge network server proxy 702 (Ghost). The content itself may be available from an edge network data storage system 704 (NetStorage) after having been uploaded there by the content provider 706 (Content Website). Step 1 uploads the content to the service provider. Step 2 facilitates the token authorization, and step 3 may involve player verification. Steps 4-15 correspond to the secure media protection scheme previously described.

ECM and Keys

The following terms have specific meanings in the context of this document.

Ephemeral Key: A key that is temporal in nature.

Session Key: An ephemeral value used to derive one or more encryption keys.

Content Encryption Key: An ephemeral key used to protect a temporal instance of the program data. This key is derived from the Session key by means of a key derivation function.

ECM Encryption Key: An ephemeral key used to protect the ECM structure. This key is derived from the Session key by means of a key derivation function.

Entitlement Control Message (ECM): A data structure used to deliver access criteria and encryption parameters from a server/head-end to a client/player.

Conditional Access: A means of controlling access to a protected resource based on subscriber/user entitlement status at the time of playback.

Generally, the technique involves content encryption rooted in a renewable security module (AUTH) that executes in or in association with the player. The technique also features lightweight conditional access functionality through a form of entitlement control messaging, payload encryption, and key management. Preferably, control messages in this implementation are carried in a data structure that is versioned and used to carry CDN-specific access criteria (encryption parameters) to the AUTH module. Preferably, each entitlement control message (ECM) is muxed with the content it protects, and it is further protected in such a way that only a provisioned player that is entitled to access the content is allowed to decrypt and play the stream.

The major parts of the system are included in the following features of the edge server ghost process and a player AUTH module.

Content Scrambling (GHost and Player AUTH)

-   Entitlement Control Message (ECM)
-   Payload Encryption
-   Container Encryption

Key Management Service

-   Server (GHost)
    -   Key Management: Session Key Generator
    -   Encryption configuration meta data
    -   Key Derivation Functions
    -   Session Management
-   Client (Player AUTH)
    -   Request for Session Key
    -   Key Derivation Functions

As described above with respect to FIG. 6 or FIG. 7, content requests are handled by the edge server ghost process which, for some forms of content, returns a file describing the segments needed for a VOD file or live event. Each segment represents a time slice of contiguous video and audio data of a configurable length, at discrete bit rates. When the player is ready to begin playback, it requests a fragment of the file, as identified in the manifest, and then awaits a response with the requested data. When the technique of this disclosure is enabled, the ghost process prepares an ECM and, per configuration metadata, encrypts the response for delivery to the player. Depending on the mode of operation, the ECM may be placed within the media container (Payload Encryption), delivered out-of-band, or prefixed to the encrypted fragment (Content Encryption). Payload encryption scrambles the sensitive codec data, leaving information surrounding the data of interest in the clear. This approach is shown in FIG. 8. Preferably, the ECM is inserted immediately ahead of the data it protects, using spatial alignment to synchronize the ECM with the scrambled data. For segmented data, such as Adobe Flash, preferably the ECM is inserted into a custom header that follows the FLV tag header, but precedes the first byte of encrypted data. This approach is shown in FIG. 9. For native MPEG2 Transport Streams, preferably the ECM resides in a separately muxed transport stream (TS) program using a packet id (PID) defined in a program map table (PMT). This approach is shown in FIG. 10. Scrambled packets typically have the Transport Scrambling Control bits of the TS header set appropriately. Container encryption treats the data as an opaque blob to which it prefixes an ECM. In this mode, preferably all container information is scrambled along with the program data (audio/video).

FIG. 11 illustrates an embodiment for HTTP dynamic streaming for use in an Adobe® Flash® runtime environment. This embodiment is merely representative.

To facilitate encryption key management, in one embodiment the edge server ghost process uses unique ephemeral keys for each fragment of content delivered to the client. Both the keys and the content preferably come from the same instance of the ghost process, and encryption session state preferably is stored locally. To protect against access abuse, edge tokenization may be used, along with SSL/TLS to protect confidentiality during the actual transport (from edge server to client).

When a player initiates a segment request, the ghost process that receives the request creates an encryption session object to store ephemeral keys and other session context information needed to track encryption state. These encryption session objects are then stored in an internal session table, keyed by a session-id string, and retained until the end of the session or a timeout. Preferably, session objects have a limited lifespan, so their expiry naturally induces a renegotiation of the session key. If the player shares its session id with some other user, token authentication may be used to limit abuses of concurrent access.
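The session table described above can be sketched as follows. This is a minimal illustration only, not the ghost process implementation: the names `EncryptionSession` and `SessionTable`, the 300-second lifespan, and the 32-byte session key length are all assumptions made for the example.

```python
import secrets
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class EncryptionSession:
    """Hypothetical session object: ephemeral key plus tracking context."""
    session_key: bytes
    created_at: float
    context: Dict[str, str] = field(default_factory=dict)


class SessionTable:
    """In-memory table keyed by a session-id string, with a limited lifespan."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds          # assumed lifespan, not from the source
        self._clock = clock
        self._sessions: Dict[str, EncryptionSession] = {}

    def create(self) -> str:
        """Create a session with a fresh random key and return its id."""
        session_id = secrets.token_hex(16)
        self._sessions[session_id] = EncryptionSession(
            session_key=secrets.token_bytes(32),  # assumed key length
            created_at=self._clock(),
        )
        return session_id

    def get(self, session_id: str) -> Optional[EncryptionSession]:
        """Return the live session, or None if unknown or expired."""
        session = self._sessions.get(session_id)
        if session is None:
            return None
        if self._clock() - session.created_at > self._ttl:
            # Expiry drops the object, which forces a new session key.
            del self._sessions[session_id]
            return None
        return session
```

Because an expired entry is removed on lookup, a client holding a stale session id must start a new session, which corresponds to the renegotiation of the session key.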

As the client parses the ECM, it encounters a URL to the Session Key. This URL typically points to an https resource, and it preferably includes a short-lived edge authentication token to protect the resource against a few common forms of unauthorized access.
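One way a short-lived token of this kind might work is sketched below. The actual edge authentication token format is not given in this document; the `expiry~mac` layout, the function names, and the use of HMAC-SHA256 over the path and expiry time are assumptions chosen purely for illustration.

```python
import hashlib
import hmac


def _mac(secret: bytes, path: str, expires_at: int) -> str:
    # The MAC covers both the resource path and the expiry time, so
    # neither can be altered without invalidating the token.
    message = f"{path}|{expires_at}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def make_token(secret: bytes, path: str, expires_at: int) -> str:
    """Build a hypothetical token of the form '<expiry>~<hex mac>'."""
    return f"{expires_at}~{_mac(secret, path, expires_at)}"


def check_token(secret: bytes, path: str, token: str, now: int) -> bool:
    """Accept the token only if it is well formed, unexpired, and authentic."""
    try:
        expiry_text, mac = token.split("~", 1)
        expires_at = int(expiry_text)
    except ValueError:
        return False
    if now > expires_at:
        return False
    expected = _mac(secret, path, expires_at)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(mac.encode("utf-8"), expected.encode("utf-8"))
```

In this sketch the server would append the token to the Session Key URL it writes into the ECM, and would call `check_token` before releasing the key.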

Within the player, the AUTH module is responsible for parsing the ECM, downloading the appropriate keying material, decrypting the data and passing it to the codec for playback. Preferably, the AUTH module includes a number of key derivation functions which, when given a session specific Session Key, produce the set of keys needed to decrypt and verify data integrity. Preferably, this module is dynamically loaded code that is or can be downloaded to the client separately from the player. This module may be cloaked using obfuscation, and in some cases, can be changed within moments of a breach. Any modification of this module induces a change in the player verification signature and player boot process.

Key derivation functions are a building block of many cryptographic systems. Given the lack of strong mutual authentication between the client and server, and the absence of trusted storage on the client, sharing content keys between client and server is undesirable. To mitigate the risks involved with key exchange over an insecure channel to the untrustworthy client, the described technique preferably uses an HMAC key derivation function with obfuscated secrets that, when given the session key (K_SESSION) and other parameters, can compute either the ECM key (K_ECM), the content key (K_CONTENT) or the HMAC salt (SALT). The KDF implementation chosen is the HKDF found in Internet RFC 5869.

A preferred implementation (FIG. 8 through FIG. 10 for various embodiments) defines three functions (KDF1, KDF2, and KDF3) that take the session key and ECM data. KDF1 and KDF2 are used to derive K_ECM and K_CONTENT, and KDF3 is used to derive SALT for the HMAC.
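As an illustration of how three such functions could sit on top of HKDF (RFC 5869), consider the following sketch. The extract and expand steps follow the RFC; everything else is an assumption for the example: the choice of SHA-256, the 16-byte output length, the label strings, and the way the ECM data is mixed into the `info` input. The obfuscated secrets mentioned above are omitted.

```python
import hashlib
import hmac

HASH = hashlib.sha256
HASH_LEN = HASH().digest_size


def hkdf_extract(salt: bytes, ikm: bytes) -> bytes:
    """HKDF-Extract (RFC 5869, section 2.2)."""
    if not salt:
        salt = bytes(HASH_LEN)  # the RFC default: HashLen zero octets
    return hmac.new(salt, ikm, HASH).digest()


def hkdf_expand(prk: bytes, info: bytes, length: int) -> bytes:
    """HKDF-Expand (RFC 5869, section 2.3)."""
    if length > 255 * HASH_LEN:
        raise ValueError("requested output is too long for HKDF")
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), HASH).digest()
        okm += block
        counter += 1
    return okm[:length]


def _derive(session_key: bytes, ecm_data: bytes, label: bytes,
            length: int = 16) -> bytes:
    # Hypothetical construction: a per-purpose label separates the three
    # outputs, and the ECM data ties each key to one message.
    prk = hkdf_extract(b"", session_key)
    return hkdf_expand(prk, label + b"|" + ecm_data, length)


def kdf1(session_key: bytes, ecm_data: bytes) -> bytes:
    """Derive K_ECM (label is an assumption)."""
    return _derive(session_key, ecm_data, b"K_ECM")


def kdf2(session_key: bytes, ecm_data: bytes) -> bytes:
    """Derive K_CONTENT (label is an assumption)."""
    return _derive(session_key, ecm_data, b"K_CONTENT")


def kdf3(session_key: bytes, ecm_data: bytes) -> bytes:
    """Derive SALT for the HMAC (label is an assumption)."""
    return _derive(session_key, ecm_data, b"SALT")
```

Because the derivation is deterministic, the server and the player AUTH module arrive at the same keys from the same Session Key and ECM data without the derived keys ever being transmitted.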

The preferred approach of this disclosure uses a combination of content scrambling, key management, and configuration metadata to enable a lightweight protection scheme for HTTP capable clients. In this section, the basic requirements of content scrambling are established, the Entitlement Control Message is described in detail and payload encryption is tied together with this structure.

In most content protection schemes the data of interest (audio/video) and the access control information (access criteria) are stored separately. While the implementation of access criteria may vary from one scheme to the next, most include a reference to the encryption key, specific decryption control words/instructions and, in some cases, policy and parental controls such as rating/morality level. Depending on how the access criteria are protected, the rights they convey, and how they are delivered, the relevant information may be stored in one of a DRM license, an ECM, a manifest or an index/playlist (m3u8) file. The choice of which scheme to use varies with content rights, runtime requirements and player capabilities. In general, DRM licenses are the most full featured implementation, followed by the ECM, with manifest and index files being the least common in monetized content distribution. Factors influencing the decision making process include perceived value of the content, support for offline playback, hardware support (TPM, secure decode path, secure key ladder, secure id, other), and content owner requirements. The ECM traditionally offers fewer features than a DRM license, at a seemingly lower degree of overall complexity. At the same time, the ECM offers resource control to the granularity of the message spacing interval, while taking fewer resources to generate and consume.

As noted above, the Entitlement Control Message, or ECM, is a data structure used to deliver access criteria and encryption parameters from the server/head-end to the client/player. Preferably, this message is included in full with the first response, and repeated in an abbreviated form during the session. When the client is authorized to play the stream, it acquires the Session Key identified by the ECM, derives the necessary protection keys (K_ECM and K_CONTENT) and begins decryption. Preferably, the Entitlement Control Message is split into two halves. The first half is accessible to the public and contains information the client uses to obtain the Session Key and derive the protection keys. The public section of the structure begins with a version field, followed by a 4 bit message type, a 16 bit KDF version, and a 128 bit initialization vector, and it ends with a variable length null-terminated URL string. The second half of the ECM is encrypted using K_ECM, and it contains parameters used to decrypt the media payload data. The private section of the ECM is decrypted using K_ECM, with a cipher of the type specified in the 4th bit of the ECM message type; the input to the decryption method runs from the first byte of the encrypted CRC16 through the last byte of the encrypted HMAC.

Preferably, every encrypted segment of data will include an ECM. The first response to every request for encrypted data will include an ECM with public and private sections, as well as a URL to a unique Session Key. The described approach grants the player conditional access to the Session Key based on validation of the edge authentication token and the result of any supported origin based authentication. Alternative implementations may include support for SAML delegation for integration with the DECE, or OAUTH and other SSO technologies for federation with vertical applications.

Encryption parameters from configuration metadata, and other server information, are copied into the ECM by the server and read from the ECM by the client. The client is responsible for obtaining the appropriate Session Key, loading the correct authentication module, and deriving the K_ECM needed to decrypt the ‘access criteria’ located in the private section of the ECM.

In payload encryption mode, only the audio and video data within the media data (MDAT) box is encrypted. Payload encryption requires additional parsing during encryption and decryption, but fewer total bytes are encrypted per unit. In container encryption mode, all of the bytes from the input media fragment are encrypted, and the ECM is then prefixed to the beginning of the HTTP response. In this mode of operation both the payload (video/audio elementary stream data) and the surrounding data are encrypted without regard for type. The advantage of this mode is that it does not require complex parsing and is consequently simpler to develop and maintain.

There are a variety of ciphers to select from for the encrypted data. The most appropriate choice will require a tradeoff between the desired strength of the cipher, the temporal nature of the data, and throughput requirements. In one embodiment, the cipher is AES, although this is not a limitation, as the implementation supports any cipher that fits within the runtime environment of the client and server.

Preferably, for every segment, a unique K_ECM and K_CONTENT are derived using the corresponding KDF, the working Session Key and additional information contained in the ECM. This helps to avoid the well-known attack, produces a steady stream of new keys and enables random access to segments using an infrequent download of the session key.

A Key Management Service (KMS) provides an interface to the key derivation functions. It contains routines to generate a Session Key, derive protection keys, retrieve the message authentication code SALT, and verify the HMAC of a given message. As noted above, preferably the information generated on the server side is protected by token authentication and referenced using server side session objects. The server implementation of KMS preferably generates strong random numbers, and it provides functions to transform the Session Key into an ECM Key, a Content Encryption Key or SALT. The client implementation of KMS preferably supports key derivation and an HMAC verifier; it is implemented within the Client AUTH Module and is loaded dynamically at runtime.
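Two of the KMS routines named above, Session Key generation and HMAC verification, can be sketched briefly. This is an illustration under stated assumptions rather than the described implementation: the function names, the 32-byte key length, and the use of HMAC-SHA256 keyed with SALT are all choices made for the example.

```python
import hashlib
import hmac
import secrets


def generate_session_key(length: int = 32) -> bytes:
    """Server side: draw a Session Key from the OS random source."""
    return secrets.token_bytes(length)  # key length is an assumption


def compute_hmac(salt: bytes, message: bytes) -> bytes:
    """Compute the message authentication code, keyed with SALT."""
    return hmac.new(salt, message, hashlib.sha256).digest()


def verify_hmac(salt: bytes, message: bytes, tag: bytes) -> bool:
    """Client side: check a received tag in constant time."""
    return hmac.compare_digest(compute_hmac(salt, message), tag)
```

A client that derives the same SALT as the server can thus detect tampering with a message before acting on its contents.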

Each above-described process preferably is implemented in computer software as a set of program instructions executable in one or more processors, as a special-purpose machine.

Representative machines on which the subject matter herein is provided may be Intel Pentium-based computers running a Linux or Linux-variant operating system and one or more applications to carry out the described functionality. One or more of the processes described above are implemented as computer programs, namely, as a set of computer instructions, for performing the functionality described.

While the above describes a particular order of operations performed by certain embodiments of the invention, it should be understood that such order is exemplary, as alternative embodiments may perform the operations in a different order, combine certain operations, overlap certain operations, or the like. References in the specification to a given embodiment indicate that the embodiment described may include a particular feature, structure, or characteristic, but every embodiment may not necessarily include the particular feature, structure, or characteristic.

While the disclosed subject matter has been described in the context of a method or process, the subject matter also relates to apparatus for performing the operations herein. This apparatus may be a particular machine that is specially constructed for the required purposes, or it may comprise a computer otherwise selectively activated or reconfigured by a computer program stored in the computer. Such a computer program may be stored in a computer readable storage medium, such as, but not limited to, any type of disk including an optical disk, a CD-ROM, and a magnetic-optical disk, a read-only memory (ROM), a random access memory (RAM), a magnetic or optical card, or any type of media suitable for storing electronic instructions, and each coupled to a computer system bus. A given implementation of the present invention is software written in a given programming language that runs in conjunction with a DNS-compliant name server (e.g., BIND) on a standard Intel hardware platform running an operating system such as Linux. The functionality may be built into the name server code, or it may be executed as an adjunct to that code. A machine implementing the techniques herein comprises a processor and computer memory holding instructions that are executed by the processor to perform the above-described methods.

While given components of the system have been described separately, one of ordinary skill will appreciate that some of the functions may be combined or shared in given instructions, program sequences, code portions, and the like. Any application or functionality described herein may be implemented as native code, by providing hooks into another application, by facilitating use of the mechanism as a plug-in, by linking to the mechanism, and the like.

Having described our invention, what we now claim is as follows.

The invention claimed is:
 1. Server apparatus, comprising: a processor; computer memory holding computer program instructions executed by the processor to perform content protection by: in response to a request for a segment of content, generating and associating with the segment an entitlement control message (ECM) that confers entitlement to a session-specific key from which one or more decryption keys are adapted to be derived, the decryption keys being associated with a cryptographic scheme and adapted for use to decrypt the segment by a client player, wherein the ECM includes, in a first public portion, a URL to the session-specific key, and, in a second private portion, at least one parameter associated with the cryptographic scheme, the at least one parameter being encrypted by an ephemeral key used to protect the ECM; encrypting the segment of content to create an encrypted fragment; serving the encrypted fragment and the ECM in response to the request; receiving a request for the session-specific key, the request for the session-specific key having been issued by the client player following parsing by the client player of the ECM for the segment; generating the session-specific key; and returning the session-specific key to the client player to enable decryption of the segment only as the segment is about to be rendered by the client player.
 2. The apparatus as described in claim 1 wherein the segment of content represents a time slice of contiguous video and audio data of a configurable length at a discrete bitrate.
 3. The apparatus as described in claim 1 wherein the ECM is associated with the segment of content by placement within a media container.
 4. The apparatus as described in claim 1 wherein the ECM is associated with the segment of content by placement within a customer header preceding a first byte of encrypted fragment.
 5. The apparatus as described in claim 1 wherein the ECM is associated with the segment of content by placement in a distinct muxed transport stream.
 6. The apparatus as described in claim 1 wherein the computer program instructions also create an encryption session object to store keys and session context information to track encryption state of the segment of content.
 7. The apparatus as described in claim 1 wherein the session-specific key and the one or more decryption keys are external to the ECM.
 8. The apparatus as described in claim 1 wherein access to the session-specific key requires a valid token for the segment of content.
 9. Client apparatus, comprising: a processor; player code; computer memory holding computer program instructions executed by the processor to perform content protection by: generating a request for a segment of content; receiving from a server an encrypted fragment, the encrypted fragment having an entitlement control message (ECM) associated therewith, the ECM conferring entitlement to a session-specific key from which one or more decryption keys associated with a cryptographic scheme are adapted to be derived, wherein the ECM includes, in a first public portion, a URL to the session-specific key, and, in a second private portion, at least one parameter associated with the cryptographic scheme, the at least one parameter being encrypted by an ephemeral key used to protect the ECM; using information in the ECM to issue to the server a request to obtain the session-specific key; receiving from the server the session-specific key, the session-specific key having been generated at a server in response to receipt at the server of the request to obtain the session-specific key; deriving, from the session-specific key, the one or more decryption keys; decrypting, using the one or more decryption keys, the received encrypted fragment, the decryption occurring only as the segment is about to be rendered by the player code.